Managing and Analyzing Next-Generation Sequence Data
Abstract
Centralized Bioinformatics Core Facilities provide shared resources for the computational and IT requirements of the investigators in their department or institution. As such, they must be able to effectively react to new types of experimental technology. Recently faced with an unprecedented flood of data generated by the next generation of DNA sequencers, these groups found it necessary to respond quickly and efficiently to the informatics and infrastructure demands. Centralized Facilities newly facing this challenge need to anticipate time and design considerations of necessary components, including infrastructure upgrades, staffing, and tools for data analyses and management. The evolution of the sequencing instrumentation is far from static. Sequence throughput from this new generation of instruments continues to increase exponentially at the same time that the cost of sequencing a genome continues to fall. These realities make the technology accessible to greater numbers of investigators while leading them to a greater usage of sequencing for a variety of experimental techniques, including variation discovery, whole transcriptome analysis, and DNA–protein interaction analysis. This places unique challenges upon the Bioinformatics Core Facility, whose mission could vary from the support of a single department or sequencing core to a Facility that supports many disparate and independent groups that run their own sequencers but rely on the Central Facility to host the informatics, research cyberinfrastructures, or both. It is worth noting that the initial investment in the instrument is accompanied by an almost equal investment in upgrading the informatics infrastructure of the institution, hiring staff to analyze the data produced by the instrument, and storing the data for future use. Many investigators do not realize that these extensive investments are necessary prior to purchasing the new technology.
This is why it is advantageous to have a centralized Bioinformatics Core to put in place platforms that acquire, store, and analyze the very large datasets created by these instruments. A Bioinformatics Core, already familiar with data of this type and complexity, dedicated to investigators, and jointly working with IT personnel, can span multiple domains rather effortlessly. The large sequencing centers (e.g., Sanger, Broad Institute, and Washington University) have automated processes and architectures not generally replicable in medium and small sequencing groups. However, as these smaller groups obtain next-generation technology they can nevertheless learn lessons from the larger centers. Through collaboration and sharing best practices, small and medium-sized groups can prepare for the arrival of the technology and develop methods to manage and analyze the data. The BioinfoCore Special Interest Group [1], affiliated with the International Society for Computational Biology, has been actively collaborating to formulate best practices to assist small and medium-sized Cores in setting up platforms for next-generation sequencing. Here, we provide a Perspective for such a Core Facility in accomplishing this task, using collective experiences from Facilities that have solved many of these issues.
Similar resources
Strategies and Clinical Applications of Next Generation Sequencing
Abstract DNA sequencing is one of the great valuable techniques in molecular biology, which can be used to detect the sequence of nucleotides in a DNA fragment. The high-throughput sequencing known as Next Generation Sequencing (NGS) revolutionized genomic research and molecular biology; therefore, the whole human genome can be sequenced with a low cost in several days. NGS technology is simi...
Analysis of Genomic Data 1
The deliverable is organized in four sections: the first section concerns the analysis of genomic data. It presents the main contributions of the GenData project to the analysis and annotation of the raw sequences produced by Next-Generation Sequencing experiments. The second section presents a framework for managing and analyzing genomic data by combining OLAP analysis and Data Mining. The thi...
Analysis of Metagenomics Next Generation Sequence Data for Fungal ITS Barcoding: Do You Need Advance Bioinformatics Experience?
During the last few decades, most of microbiology laboratories have become familiar in analyzing Sanger sequence data for ITS barcoding. However, with the availability of next-generation sequencing platforms in many centers, it has become important for medical mycologists to know how to make sense of the massive sequence data generated by these new sequencing technologies. In many reference lab...
Analyzing Data Movements and Identifying Techniques for Next-generation High-bandwidth Networks
High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications’ perspective. We have investigated data transfer requirements of climate applications as a typical scientific example...
Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools
MOTIVATION Storing, annotating and analyzing variants from next-generation sequencing projects can be difficult due to the availability of a wide array of data formats, tools and annotation sources, as well as the sheer size of the data files. Useful tools, including the GATK, ANNOVAR and BEDTools can be integrated into custom pipelines for annotating and analyzing sequence variants. However, b...
Journal title:
- PLoS Computational Biology
Volume 5, Issue
Pages -
Publication date: 2009